Skip to content

smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping#3648

Merged
elitegreg merged 5 commits intomainfrom
bc/rfc18-devnet-fixes
May 8, 2026
Merged

smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping#3648
elitegreg merged 5 commits intomainfrom
bc/rfc18-devnet-fixes

Conversation

@ben-malbeclabs
Copy link
Copy Markdown
Contributor

@ben-malbeclabs ben-malbeclabs commented May 1, 2026

Summary

  • Atomic flex-algo node segment allocation on CreateDeviceInterface: Vpnv4 loopbacks under onchain allocation now allocate a FlexAlgoNodeSegment for each existing topology in the same transaction. New/reprovisioned devices no longer need a separate AssignTopologyNodeSegments step. The CLI auto-discovers topologies and passes them.
  • New UpdateDeviceInterface(update_topologies=true) reconciliation path: deallocate removed topologies, allocate added ones, preserve unchanged. Authorized for the contributor owner or the foundation allowlist.
  • DeleteDeviceInterface now also deallocates the loopback's flex-algo node segments under onchain deallocation.
  • Controller: stamp the default UNICAST-DEFAULT topology color for users whose access pass has no tenant.
  • Mechanical rename BackfillTopologyAssignTopologyNodeSegments across the program, CLI, and Rust SDK (instruction discriminant 110 unchanged).

Context

Two issues discovered during the RFC-18 devnet deployment:

1. Missing flex-algo node segments on new devices. The activator used to call BackfillTopology after activating a VPNv4 loopback, assigning FlexAlgoNodeSegment entries for each existing topology. The activator was removed (#3607), and this responsibility was never ported. New or reprovisioned devices (e.g. dzd4 via the daily provisioning test) were missing node segments, causing the controller to log flex_algo.enabled=true but VPNv4 loopback has no flex_algo_node_segments.

The fix extends the onchain CreateDeviceInterface instruction to accept optional topology PDA accounts and allocate one FlexAlgoNodeSegment per topology atomically. UpdateDeviceInterface gains an update_topologies mode for ongoing reconciliation when topologies are added/removed after device creation, and DeleteDeviceInterface deallocates the segments alongside the base loopback SR ID. Topology accounts are validated by program-owner and by the first byte (AccountType::Topology) — a cheap check that prevents passing arbitrary program-owned accounts.

2. Community stamping skipping tenantless users. resolveTenantColors was only called inside the tenant lookup branch. All 985 unicast users on mainnet and 474/535 on testnet have no tenant, so they were silently getting no color. The fix adds an else branch that resolves the default UNICAST-DEFAULT color for tenantless users.

Related RFC: rfcs/rfc-0018-flex-algo.md

Testing Verification

  • cargo test -p doublezero-serviceability — full suite passes, including new test_update_topologies_* cases (adds/removes/clear/full-swap/no-op/rejects-non-vpnv4/rejects-duplicates) and test_update_topologies_allowed_for_contributor which verifies a non-foundation contributor-owner key can drive update_topologies
  • cargo test -p doublezero_cli device::interface::getflex_algo_node_segments now render as topology_name:sr_id rows; list_topology is fetched once and cached per get
  • Verified mainnet has 0 users with tenants, testnet has 40/535 — confirming the community stamping gap

@ben-malbeclabs ben-malbeclabs self-assigned this May 1, 2026
@ben-malbeclabs ben-malbeclabs changed the title controller: stamp default topology color for tenantless users smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping May 1, 2026
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-devnet-fixes branch from 2b95b4c to f15af55 Compare May 1, 2026 23:18
@ben-malbeclabs ben-malbeclabs requested a review from elitegreg May 2, 2026 01:41
@ben-malbeclabs ben-malbeclabs marked this pull request as ready for review May 2, 2026 01:48
Copy link
Copy Markdown
Contributor

@elitegreg elitegreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

@ben-malbeclabs
Copy link
Copy Markdown
Contributor Author

ben-malbeclabs commented May 2, 2026

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

It would be a decent limit. I can't imagine us having that many.

I think other technologies become more interesting for us without have 10s of Flex-Algo topologies.

Worst case, if we did need more than 8, or even more than 25, is that difficult to achieve?

@elitegreg
Copy link
Copy Markdown
Contributor

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

It would be a decent limit. I can't imagine us having that many.

I think other technologies become more interesting for us without have 10s of Flex-Algo topologies.

Worst case, if we did need more than 8, or even more than 25, is that difficult to achieve?

Difficult to achieve? Probably what we would need to do is come up with a more flexible implementation and create new version of the create interface command and retire the old one (this latter case is a big deal, with the activator instructions being the first instructions we are actually retiring). I think I would feel better if we had an instruction for adding a flex algo to an interface, and the having the interface create sdk (client side) call create interface, followed by iteratively adding the flex algo (each would allocate a segment routing id). My preference would be to do it this way now and not bork up create interface in a way that may only be temporary.

ben-malbeclabs and others added 4 commits May 8, 2026 20:01
Community stamping only resolved colors for users with a tenant
assigned. Since most users (all of mainnet, most of testnet) have no
tenant, they were silently skipped. Add an else branch that calls
resolveTenantColors with nil includeTopologies, which falls through to
the default UNICAST-DEFAULT color.
When creating a VPNv4 loopback with onchain allocation, the instruction
now accepts optional topology PDA accounts and allocates a
FlexAlgoNodeSegment for each. This replaces the activator's post-
activation backfill that was lost during the activator removal.

The CLI automatically discovers existing topologies and passes them to
the instruction, so `doublezero device interface create ... --loopback-type
vpn-ipv4` assigns all node segments atomically — no manual steps.

Also rename BackfillTopology → AssignTopologyNodeSegments across the
codebase (instruction discriminant 110 unchanged).
@elitegreg elitegreg force-pushed the bc/rfc18-devnet-fixes branch from f15af55 to 99468eb Compare May 8, 2026 21:09
Comment thread smartcontract/cli/src/device/interface/get.rs Outdated
- cli/device-interface-get: cache list_topology in a HashMap so we
  don't refetch per flex-algo segment
- create/update: validate topology accounts by first-byte
  AccountType::Topology in addition to program-owner
- update: allow contributor.owner to drive update_topologies, not just
  the foundation allowlist
- update: size kept Vec by max(existing, desired) so we don't realloc
  when the desired set grows
- tests: add test_update_topologies_allowed_for_contributor exercising
  the contributor-owner authorization branch
- changelog: add Unreleased entries covering this PR's smartcontract,
  controller, and CLI changes
@elitegreg
Copy link
Copy Markdown
Contributor

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

It would be a decent limit. I can't imagine us having that many.
I think other technologies become more interesting for us without have 10s of Flex-Algo topologies.
Worst case, if we did need more than 8, or even more than 25, is that difficult to achieve?

Difficult to achieve? Probably what we would need to do is come up with a more flexible implementation and create new version of the create interface command and retire the old one (this latter case is a big deal, with the activator instructions being the first instructions we are actually retiring). I think I would feel better if we had an instruction for adding a flex algo to an interface, and the having the interface create sdk (client side) call create interface, followed by iteratively adding the flex algo (each would allocate a segment routing id). My preference would be to do it this way now and not bork up create interface in a way that may only be temporary.

I have decided Ben's way of implementing this is fine for now. If we hit a limit for account arguments, we can add another instruction to add more topologies.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses RFC-18 devnet rollout gaps by making flex-algo node-segment allocation/reconciliation atomic and automatic for VPNv4 loopbacks, and by ensuring tenantless users still receive default topology color stamping in the controller. It also performs a mechanical rename of the topology “backfill” instruction/command to “assign node segments” while keeping the instruction discriminant stable.

Changes:

  • Smartcontract: extend CreateDeviceInterface / UpdateDeviceInterface / DeleteDeviceInterface to allocate, reconcile, and deallocate FlexAlgoNodeSegment SR IDs under onchain allocation.
  • Controller: resolve/stamp default UNICAST-DEFAULT topology color communities for tenantless users when FlexAlgo is enabled.
  • SDK/CLI: rename BackfillTopologyAssignTopologyNodeSegments, update commands/tests, and enhance device interface get output to display flex-algo segments.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
smartcontract/sdk/rs/src/commands/topology/mod.rs Switch topology command module export to assign_node_segments.
smartcontract/sdk/rs/src/commands/topology/create.rs Update topology creation flow/tests to use AssignTopologyNodeSegments.
smartcontract/sdk/rs/src/commands/topology/assign_node_segments.rs Rename command + wire instruction to AssignTopologyNodeSegments.
smartcontract/sdk/rs/src/commands/device/interface/update.rs Add optional topology reconciliation support (topology PDA metas + args).
smartcontract/sdk/rs/src/commands/device/interface/create.rs Add optional topology PDA metas/count for atomic flex-algo segment allocation.
smartcontract/programs/doublezero-telemetry/tests/test_helpers.rs Update test helper args for new topology_count field.
smartcontract/programs/doublezero-serviceability/tests/unlink_device_interface_test.rs Update interface create args for new topology_count field.
smartcontract/programs/doublezero-serviceability/tests/topology_test.rs Rename Backfill → AssignNodeSegments and update args/imports.
smartcontract/programs/doublezero-serviceability/tests/resource_extension_test.rs Update interface create args for new topology_count field.
smartcontract/programs/doublezero-serviceability/tests/link_wan_test.rs Update interface create/update args for new topology reconciliation fields.
smartcontract/programs/doublezero-serviceability/tests/link_onchain_allocation_test.rs Update interface create args for new topology_count field.
smartcontract/programs/doublezero-serviceability/tests/link_dzx_test.rs Update interface create args for new topology_count field.
smartcontract/programs/doublezero-serviceability/tests/interface_test.rs Update interface create/update args for new topology fields.
smartcontract/programs/doublezero-serviceability/tests/interface_onchain_allocation_test.rs Add extensive flex-algo create/delete/update-topologies test coverage and update args.
smartcontract/programs/doublezero-serviceability/tests/global_test.rs Update interface create args for new topology_count field.
smartcontract/programs/doublezero-serviceability/src/processors/topology/mod.rs Switch topology processor module export to assign_node_segments.
smartcontract/programs/doublezero-serviceability/src/processors/topology/assign_node_segments.rs Rename processor/args and keep a compatibility alias for old args name.
smartcontract/programs/doublezero-serviceability/src/processors/device/interface/update.rs Implement update_topologies reconciliation logic (allocate/deallocate flex-algo SR IDs).
smartcontract/programs/doublezero-serviceability/src/processors/device/interface/delete.rs Deallocate flex-algo SR IDs on interface delete (onchain deallocation).
smartcontract/programs/doublezero-serviceability/src/processors/device/interface/create.rs Allocate flex-algo SR IDs during atomic VPNv4 loopback creation.
smartcontract/programs/doublezero-serviceability/src/instructions.rs Rename instruction enum variant to AssignTopologyNodeSegments (discriminant 110 retained).
smartcontract/programs/doublezero-serviceability/src/entrypoint.rs Route instruction 110 to the renamed processor entrypoint.
smartcontract/cli/src/topology/mod.rs Switch CLI topology module export to assign_node_segments.
smartcontract/cli/src/topology/assign_node_segments.rs Rename CLI command and update output/tests.
smartcontract/cli/src/doublezerocommand.rs Rename CLI trait method from backfill to assign node segments.
smartcontract/cli/src/device/interface/update.rs Add --topologies flag and plumb through to SDK update command.
smartcontract/cli/src/device/interface/get.rs Display flex_algo_node_segments as topology_name:sr_id rows with topology lookup.
smartcontract/cli/src/device/interface/create.rs Auto-discover existing topologies for VPNv4 loopback create to enable atomic assignment.
controlplane/doublezero-admin/src/cli/migrate.rs Update migration command to call AssignTopologyNodeSegments API.
controlplane/controller/internal/controller/server.go Apply default topology color stamping for tenantless users when FlexAlgo enabled.
client/doublezero/src/main.rs Rename CLI subcommand dispatch Backfill → AssignNodeSegments.
client/doublezero/src/cli/link.rs Rename CLI topology subcommand Backfill → AssignNodeSegments.
CHANGELOG.md Document renamed instruction and new atomic create/update/delete flex-algo behaviors + controller stamping fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread smartcontract/sdk/rs/src/commands/device/interface/create.rs
Comment thread smartcontract/sdk/rs/src/commands/device/interface/update.rs
Comment thread smartcontract/cli/src/device/interface/update.rs
Comment thread smartcontract/cli/src/device/interface/get.rs
@elitegreg elitegreg enabled auto-merge (squash) May 8, 2026 22:35
@elitegreg elitegreg merged commit ec61df1 into main May 8, 2026
37 checks passed
@elitegreg elitegreg deleted the bc/rfc18-devnet-fixes branch May 8, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants